1. Identity statement | |
Reference Type | Conference Paper (Conference Proceedings) |
Site | sibgrapi.sid.inpe.br |
Holder Code | ibi 8JMKD3MGPEW34M/46T9EHH |
Identifier | 8JMKD3MGPAW/3S4ELD8 |
Repository | sid.inpe.br/sibgrapi/2018/10.24.00.46 |
Last Update | 2018:10.24.00.46.42 (UTC) felipe.duarte@itau-unibanco.com.br |
Metadata Repository | sid.inpe.br/sibgrapi/2018/10.24.00.46.42 |
Metadata Last Update | 2022:05.18.22.18.35 (UTC) administrator |
Citation Key | KuboNazAguOliDua:2018:UsUNPr |
Title | The usage of U-Net for pre-processing document images |
Format | On-line |
Year | 2018 |
Access Date | 2024, May 01 |
Number of Files | 1 |
Size | 906 KiB |
|
2. Context | |
Author | 1 Kubo, Diandra Akemi; 2 Nazare, Tiago Santana de; 3 Aguirre, Priscila Louise Ribeiro; 4 Oliveira, Bruno Domingues; 5 Duarte, Felipe Simões Lage Gomes |
Affiliation | 1 Data Science Team - Itau Unibanco; 2 Data Science Team - Itau Unibanco; 3 Data Science Team - Itau Unibanco; 4 Data Science Team - Itau Unibanco; 5 Data Science Team - Itau Unibanco |
Editor | Ross, Arun; Gastal, Eduardo S. L.; Jorge, Joaquim A.; Queiroz, Ricardo L. de; Minetto, Rodrigo; Sarkar, Sudeep; Papa, João Paulo; Oliveira, Manuel M.; Arbeláez, Pablo; Mery, Domingo; Oliveira, Maria Cristina Ferreira de; Spina, Thiago Vallin; Mendes, Caroline Mazetto; Costa, Henrique Sérgio Gutierrez; Mejail, Marta Estela; Geus, Klaus de; Scheer, Sergio |
e-Mail Address | felipe.duarte@itau-unibanco.com.br |
Conference Name | Conference on Graphics, Patterns and Images, 31 (SIBGRAPI) |
Conference Location | Foz do Iguaçu, PR, Brazil |
Date | 29 Oct.-1 Nov. 2018 |
Publisher | Sociedade Brasileira de Computação |
Publisher City | Porto Alegre |
Book Title | Proceedings |
Tertiary Type | Industry Application Paper |
History (UTC) | 2018-10-24 00:46:42 :: felipe.duarte@itau-unibanco.com.br -> administrator :: 2022-05-18 22:18:35 :: administrator -> :: 2018 |
|
3. Content and structure | |
Is the master or a copy? | is the master |
Content Stage | completed |
Transferable | 1 |
Keywords | #deep-learning #computer-vision #image-processing |
Abstract | When processing documents in real-world scenarios, it is common to deal with artifacts that may hamper document analysis, such as stamps, noise and strange backgrounds. Aiming to mitigate these problems, we propose the use of U-Net, a very successful biomedical image segmentation network, for handwritten and machine text segmentation. In order to do so, we trained a model for each type of text. One of the main advantages presented is that the models are trained on artificial data, avoiding the wearisome task of data labeling. For the machine text segmentation model, we test its impacts on both word and character recognition when combined with the Tesseract OCR model. For the handwritten segmentation model, we present qualitative results. Initial experiments indicate that both models are able to improve results in their respective applications. |
doc Directory Content | access |
source Directory Content | there are no files |
agreement Directory Content | |
|
4. Conditions of access and use | |
data URL | http://urlib.net/ibi/8JMKD3MGPAW/3S4ELD8 |
zipped data URL | http://urlib.net/zip/8JMKD3MGPAW/3S4ELD8 |
Language | en |
Target File | sibgrapi_pi_cv.pdf |
User Group | felipe.duarte@itau-unibanco.com.br |
Visibility | shown |
Update Permission | not transferred |
|
5. Allied materials | |
Mirror Repository | sid.inpe.br/banon/2001/03.30.15.38.24 |
Next Higher Units | 8JMKD3MGPAW/3RPADUS |
Citing Item List | sid.inpe.br/sibgrapi/2018/09.03.20.37 8 |
Host Collection | sid.inpe.br/banon/2001/03.30.15.38 |
|